DETAILED ACTION 

1. This action is in response to the amendment filed on 4/19/04. 

2. The objections to the drawings are withdrawn, in view of applicant's amendment. 

3. The objections to the specification are withdrawn, in view of applicant's 
amendment. 

4. The objection to claim 1 is withdrawn, in view of applicant's amendment. 

Claim Rejections - 35 USC § 102 

5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

Claims 1-4, 9-12, 17-19, 24-28 and 33-35 are rejected under 35 U.S.C. 102(b) as 
being anticipated by Carr et al. (Carr), "Compiler Optimizations for Improving Data 
Locality", 1994, ACM, p. 252-262. 

As per claim 1, Carr discloses a method, comprising: 

- identifying a loop and each vector memory reference in the loop, in a 
program (p. 253 col. L lines 61-62, "data dependence (is determined) between two 
arrays (vector memory references) ... (in a loop)"), 

- determining dependencies between vector memory references in the loop, 
including determining unidirectional and circular dependencies (p. 253 col. L lines 
61-62, "data dependence (is determined) between two arrays (vector memory 
references) ... (in a loop)"), 

- reducing cache thrashing (p. 252 col. R lines 8-9, "Improve the order of 
memory accesses to exploit all levels of the memory hierarchy", and exploiting the 
cache portion of the memory hierarchy, is to use it efficiently, in its designed manner. A 
cache, operating efficiently in its designed manner of operation, is kept full of the most 
used memory access locations while cache thrashing is minimized, and p. 252 col. R 
lines 32-34, "We use the model to derive a loop structure which results in the fewest 
accesses to main memory (i.e. making the code access the cache and main memory in 
an efficient manner, thereby reducing cache thrashing)"), by distributing the vector 
memory references into a plurality of detail loops, wherein the vector memory 
references that have circular dependencies therebetween are included in a 
common detail loop, and wherein the detail loops are ordered according to the 
unidirectional dependencies between the memory references (p. 253 col. L lines 2- 
6, "applying compiler transformations based on data dependence (e.g., loop 
interchange, fusion, distribution, and tiling) to improve paging... In this paper, we ... 
integrate optimizations for parallelism and memory", and p. 253 col. L lines 61-62, "data 
dependence (is determined) between two arrays (vector memory references) ... (in a 
loop)", and p. 256 col. L lines 48-51, "Loop distribution separates independent 
statements in a single loop into multiple loops with identical headers. To maintain the 
meaning of the original loop, statements in a recurrence (a cycle in the dependence 
graph) must be placed in the same (common detail) loop (and the detail loops must be 
ordered according to the unidirectional dependencies between the memory 
references)"). 
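
For illustration only, the distribution rule quoted above (statements in a dependence cycle stay in one loop, and the resulting loops follow the one-way dependences) can be sketched as follows. This is a minimal sketch, not Carr's algorithm or the applicant's implementation: the function name `distribute`, the statement labels, and the use of Tarjan's strongly-connected-components algorithm are all assumptions made for the example.

```python
from collections import defaultdict


def distribute(statements, deps):
    """Group statements into detail loops (hypothetical helper).

    statements: statement labels in original program order.
    deps: (src, dst) pairs meaning dst depends on src.
    Returns a list of loops; each loop is a list of statements.
    Statements in a dependence cycle share a loop, and every
    dependence source lands in the same loop as its sink or an earlier one.
    """
    graph = defaultdict(list)
    for src, dst in deps:
        graph[src].append(dst)

    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        # Tarjan's algorithm: assign discovery index and low-link.
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a strongly connected component.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(comp)

    for s in statements:
        if s not in index:
            visit(s)

    # Tarjan emits components sinks-first; reverse to get sources first.
    components.reverse()
    position = {s: i for i, s in enumerate(statements)}
    return [sorted(c, key=position.get) for c in components]


# S1 and S2 form a recurrence; S3 depends one-way on S2; S4 is independent.
loops = distribute(
    ["S1", "S2", "S3", "S4"],
    [("S1", "S2"), ("S2", "S1"), ("S2", "S3")],
)
```

In this example S1 and S2 end up in a common loop, and that loop is placed before the loop holding S3.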

As per claim 2, the rejection of claim 1 is incorporated and further, Carr discloses 
allocating a plurality of temporary storage areas within a cache and determining 
the size of each temporary storage area based on the size of the cache and the 
number of temporary storage areas (p. 252 col. R lines 9-12, "loop ... distribution ... 
requires knowledge ... of the cache line size", and p. 252 col. R lines 14-15, 
"Knowledge of the cache size, associativity, and replacement policy is essential", and 
the optimization technique of loop distribution includes the allocation of a plurality of 
temporary storage areas within a cache and determining the size of each temporary 
storage area based on the size of the cache and the number of temporary storage 
areas). 
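
As a rough illustration of the sizing limitation recited in claim 2, the size of each temporary storage area can be derived from the cache size and the number of areas. This is a sketch under stated assumptions: the function name `temp_area_size`, the 64-byte default line size, and the rounding down to a whole number of cache lines are hypothetical and appear in neither the claim nor Carr.

```python
def temp_area_size(cache_bytes, num_areas, line_bytes=64):
    """Return the byte size of each temporary storage area (hypothetical).

    The cache is split evenly among the areas, then rounded down to a
    whole number of cache lines (the rounding is an assumption).
    """
    if num_areas <= 0:
        raise ValueError("num_areas must be positive")
    per_area = cache_bytes // num_areas
    return (per_area // line_bytes) * line_bytes


# A 32 KiB cache shared by four temporary storage areas.
size_four = temp_area_size(32 * 1024, 4)
# The same cache shared by three areas, rounded down to a line multiple.
size_three = temp_area_size(32 * 1024, 3)
```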

As per claim 3, the rejection of claim 1 is incorporated and further, Carr discloses 
a section loop including the plurality of detail loops (p. 256 col. L line 48, "Loop 
distribution"). 

As per claim 4, the rejection of claim 1 is incorporated and further, Carr discloses 
distributing the vector memory references into a plurality of detail loops further 
comprises distributing the vector memory references into a plurality of detail 
loops that each contain at least one vector memory reference that could benefit 
from cache management (p. 253 col. L lines 2-6, "applying compiler transformations 
based on data dependence (e.g., loop interchange, fusion, distribution, and tiling) to 
improve paging"). 

As per claims 9-12 Carr also discloses such claimed limitations as addressed in 
claims 1-4 above, respectively. 

As per claim 17 Carr discloses a method, comprising: 

- identifying a loop in a program (p. 253 col. L lines 61-62, "data dependence 
(is determined) between two arrays (vector memory references) ... (in a loop)"), 

- identifying each vector memory reference in the loop, determining 
dependencies between vector memory references in the loop (p. 253 col. L lines 61- 
62, "data dependence (is determined) between two arrays (vector memory references) 
... (in a loop)"); and 

- reducing cache thrashing (p. 252 col. R lines 8-9, "Improve the order of 
memory accesses to exploit all levels of the memory hierarchy", and exploiting the 
cache portion of the memory hierarchy, is to use it efficiently, in its designed manner. A 
cache, operating efficiently in its designed manner of operation, is kept full of the most 
used memory access locations while cache thrashing is minimized, and p. 252 col. R 
lines 32-34, "We use the model to derive a loop structure which results in the fewest 
accesses to main memory (i.e. making the code access the cache and main memory in 
an efficient manner, thereby reducing cache thrashing)"), by distributing the vector 
memory references into a plurality of detail loops in response to cache behavior 
and the dependencies between the vector memory references in the loop (p. 256 
col. L lines 48-51, "Loop distribution separates independent statements in a single loop 
into multiple loops with identical headers", and p. 252 col. R lines 14-15, "Knowledge of 
the cache size, associativity, and replacement policy (i.e. cache behavior) is essential"). 

As per claim 18, the rejection of claim 17 is incorporated and further, Carr 
discloses determining dependencies between vector memory references in the 
loop, and wherein distributing the loop includes distributing the vector memory 
references into the plurality of detail loops, wherein the vector memory 
references that have circular dependencies therebetween are included in a 
common detail loop (p. 253 col. L lines 61-62, "data dependence (is determined) 
between two arrays (vector memory references) ... (in a loop)", and p. 256 col. L lines 
48-51, "Loop distribution separates independent statements in a single loop into multiple 
loops with identical headers. To maintain the meaning of the original loop, statements 
in a recurrence (a cycle in the dependence graph) must be placed in the same loop"). 

As per claim 19, the rejection of claim 17 is incorporated and further, Carr 
discloses determining dependencies between vector memory references in the 
loop, and wherein distributing the loop includes distributing the vector memory 
references into the plurality of detail loops, wherein the vector memory 
references that have circular dependencies therebetween are included in a 
common detail loop (p. 253 col. L lines 61-62, "data dependence (is determined) 
between two arrays (vector memory references) ... (in a loop)", and p. 256 col. L lines 
48-51, "Loop distribution separates independent statements in a single loop into multiple 
loops with identical headers. To maintain the meaning of the original loop, statements 
in a recurrence (a cycle in the dependence graph) must be placed in the same loop"). 

As per claims 24-28 Carr also discloses such claimed limitations as addressed in 
claims 1-3 above. 

As per claim 33 Carr discloses a method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system having a cache (p. 252 
col. R lines 8-9, "Improve the order of memory accesses to exploit all levels of the 
memory hierarchy", and exploiting the cache portion of the memory hierarchy, is to use 
it efficiently, in its designed manner. A cache, operating efficiently in its designed 
manner of operation, is kept full of the most used memory access locations while cache 
thrashing is minimized, and p. 252 col. R lines 32-34, "We use the model to derive a 
loop structure which results in the fewest accesses to main memory (i.e. making the 
code access the cache and main memory in an efficient manner, thereby reducing 
cache thrashing)"), comprising: executing the software on the computer system; 
generating a profile indicating the manner in which the software uses the cache; 
identifying a portion of the software using the profile data that may experience 
cache thrashing; and modifying the identified portion of the software to reduce 
the likelihood of cache thrashing (p. 253 col. L lines 61-62, "data dependence (i.e. a 
portion of the software that may experience cache thrashing) (is identified) between two 
arrays ... (in a loop)", and p. 253 col. L lines 2-6, "compiler transformations (are applied) 
based on data dependence (e.g., loop interchange, fusion, distribution, and tiling) to 
improve paging (to reduce the likelihood of cache thrashing)... In this paper, we ... 
integrate optimizations for parallelism and memory"). 

As per claim 34, the rejection of claim 33 is incorporated and further, Carr 
discloses modifying the identified portion of the software to reduce the likelihood 
of cache thrashing further comprises: identifying a loop in the identified portion 
of the software; identifying each vector memory reference in the identified loop; 
determining dependencies between the vector memory references in the 
identified loop of the software, including determining unidirectional and circular 
dependencies; and distributing the vector memory references into a plurality of 
detail loops, wherein the vector memory references that have circular 
dependencies therebetween are included in a common detail loop, and wherein 
the detail loops are ordered according to the unidirectional dependencies 
between the memory references (p. 253 col. L lines 2-6, "applying compiler 
transformations based on data dependence (e.g., loop interchange, fusion, distribution, 
and tiling) to improve paging", and p. 253 col. L lines 61-62, "data dependence (is 
determined) between two arrays (vector memory references) ... (in a loop)", and p. 256 
col. L lines 48-51, "Loop distribution separates independent statements in a single loop 
into multiple loops with identical headers. To maintain the meaning of the original loop, 
statements in a recurrence (a cycle in the dependence graph) must be placed in the 
same (detail) loop (and the detail loops must be ordered according to the unidirectional 
dependencies between the memory references)"). 

As per claim 35, this is another method version of the claimed method discussed 
above, in claim 33, wherein all claimed limitations have also been addressed above. 

Claim Rejections - 35 USC § 103 

6. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 5-8, 13-16, 20-23 and 29-32 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Carr et al. (Carr), "Compiler Optimizations for Improving Data 
Locality", 1994, ACM, p. 252-262 in view of Mahadevan et al. (Mahadevan) U.S. Patent 
No. 5,797,013. 

As per claim 5, the rejection of claim 1 is incorporated and further, Carr doesn't 
explicitly disclose inserting cache management instructions into at least one of 
said detail loops to control movement of data associated with the vector memory 
reference between a cache and main memory. 

However, Mahadevan, in an analogous environment, discloses inserting cache 
management instructions into loops to control movement of data associated with 
the vector memory reference between a cache and main memory (col. 6 lines 54- 
55, "(the compiler can) insert prefetches and effect other optimizations (cache 
management instructions) into the ... loop code"). 

Therefore, it would have been obvious to a person of ordinary skill in the art, at 
the time the invention was made, to incorporate the teachings of Mahadevan into the 
system of Carr to have cache management instructions inserted into detail loops to 
control movement of data associated with the vector memory reference between a 
cache and main memory. The modification would have been obvious because one of 
ordinary skill in the art would want to compile the code using techniques that will 
maximize the efficiency of the compiled code's cache usage and therefore overall 
operation. 

As per claim 6, the rejection of claim 1 is incorporated and further, Carr doesn't 
explicitly disclose inserting prefetch instructions into at least one of said detail 
loops to control movement of data associated with the vector memory reference 
between a cache and main memory. 



Application/Control Number: 09/785,143 Page 11 

Art Unit: 2122 

However, Mahadevan, in an analogous environment, discloses inserting 
prefetch instructions into loops to control movement of data associated with the 
vector memory reference between a cache and main memory (col. 6 lines 54-55, 
"(the compiler can) insert prefetches and effect other optimizations into the ... loop 
code"). 

Therefore, it would have been obvious to a person of ordinary skill in the art, at 
the time the invention was made, to incorporate the teachings of Mahadevan into the 
system of Carr to have prefetch instructions inserted into detail loops to control 
movement of data associated with the vector memory reference between a cache and 
main memory. The modification would have been obvious because one of ordinary skill 
in the art would want to compile the code using techniques that will maximize the 
efficiency of the compiled code's cache usage and therefore overall operation. 

As per claim 7, the rejection of claim 1 is incorporated and further, Carr doesn't 
explicitly disclose performing loop unrolling on at least one of said detail loops to 
control movement of data associated with the vector memory reference between 
a cache and main memory. 

However, Mahadevan, in an analogous environment, discloses performing loop 
unrolling on loops to control movement of data associated with the memory 
reference between a cache and main memory (col. 6 line 27, "the compiler unrolls 
loops"). 

Therefore, it would have been obvious to a person of ordinary skill in the art, at 
the time the invention was made, to incorporate the teachings of Mahadevan into the 
system of Carr to have loop unrolling performed on at least one of said detail loops to 
control movement of data associated with the vector memory reference between a 
cache and main memory. The modification would have been obvious because one of 
ordinary skill in the art would want to optimize the performance of the compiled code. 
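
For readers unfamiliar with the transformation, loop unrolling can be illustrated as follows. This is a generic sketch, not Mahadevan's compiler output: the function names and the unroll factor of 4 are assumptions chosen for the example, and Python is used only to show the shape of the rewritten loop.

```python
def sum_rolled(values):
    """Reference loop: one element per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled4(values):
    """Same loop unrolled by a factor of 4 (hypothetical factor)."""
    total = 0
    n = len(values)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # Four copies of the original body per iteration.
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
    for i in range(main, n):
        # Remainder loop handles the leftover 0 to 3 elements.
        total += values[i]
    return total


data = list(range(10))
rolled = sum_rolled(data)
unrolled = sum_unrolled4(data)
```

Both functions visit the elements in the same order, so they compute the same result; the unrolled form only reduces the number of loop-control steps.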

As per claim 8, the rejection of claim 1 is incorporated and further, Carr doesn't 
explicitly disclose inserting at least one of a prefetch instruction and a cache 
management instruction into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory, and performing loop unrolling on at least one of said detail 
loops to control movement of data associated with the vector memory reference 
between a cache and main memory. 

However, Mahadevan, in an analogous environment, discloses inserting a 
prefetch instruction and a cache management instruction into loops to control 
movement of data associated with the memory reference between a cache and 
main memory, and performing loop unrolling on loops to control movement of 
data associated with the memory reference between a cache and main memory 
(col. 6 lines 54-55, "(the compiler can) insert prefetches and effect other optimizations 
(cache management instructions) into the ... loop code", and col. 6 line 27, "the 
compiler unrolls loops"). 

Therefore, it would have been obvious to a person of ordinary skill in the art, at 
the time the invention was made, to incorporate the teachings of Mahadevan into the 
system of Carr to have the insertion of at least one of a prefetch instruction and a cache 
management instruction into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main memory, 
and performing loop unrolling on at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main memory. 
The modification would have been obvious because one of ordinary skill in the art would 
want to gain the performance advantages provided by using these optimization 
techniques in combination (Mahadevan, col. 6 line 22 - col. 7 line 29). 

As per claims 13-16, 20-23 and 29-32, the Carr/Mahadevan combination also 
discloses such claimed limitations as addressed in claims 5-8 above. 

Response to Arguments 

8. Applicant's arguments filed 4/19/04, with respect to claims 1, 9, 17, 24, 25, 26, 33 
and 35, have been fully considered but they are not persuasive. 

In the remarks, the applicant has argued substantially that: 

1) Carr does not suggest the desirability to reduce cache thrashing by distributing 
the vector memory references into a number of detail loops. 

Examiner response: 

1) Carr does, in fact, suggest the desirability to reduce cache thrashing by 
distributing the vector memory references into a number of detail loops. He suggests 
the desirability to reduce cache thrashing by stating the desire to "improve the order of 
memory accesses to exploit all levels of the memory hierarchy", on p. 252 col. R lines 8- 
9, and exploiting the cache portion of the memory hierarchy, is to use it efficiently, in its 
designed manner. A cache, operating efficiently in its designed manner of operation is 
kept full of the most used memory access locations while cache thrashing is minimized, 
and on p. 252 col. R lines 32-34, he discloses that "we use the model to derive a loop 
structure which results in the fewest accesses to main memory (i.e. making the code 
access the cache and main memory in an efficient manner, thereby reducing cache 
thrashing)". 

Further, in response to applicant's argument that Carr does not suggest the 
desirability to reduce cache thrashing by distributing the vector memory references into 
a number of detail loops, the fact that applicant has recognized another advantage 
which would flow naturally from following the suggestion of the prior art cannot be the 
basis for patentability when any differences would otherwise be obvious. 

In the remarks, the applicant has argued substantially that: 

2) Carr discloses improving data locality which is in contrast to the applicant's 
disclosure of reducing cache thrashing. 

Examiner response: 

2) The examiner disagrees with how the applicant characterizes the prior art's 
teachings, in that, the prior art's technique of improving data locality, also reduces 
cache thrashing. In the prior art, Carr discloses compiling a program in a way that 
ensures that the program's memory accesses will use the cache efficiently. A cache 
that is operating efficiently, operates in a way that minimizes the eviction of needed data 
from the cache (i.e. the cache experiences minimal cache thrashing). 

The reference defines data locality as "the property that references to the same 
memory location or adjacent locations are reused within a short period of time", on p. 
252, col. L lines 30-32. Improving data locality by using the compiler to optimize code 
will ensure that references to the same or adjacent memory locations are located in the 
cache at the same time, thereby reducing the likelihood of cache thrashing. 
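
The relationship between access order and cache thrashing can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not taken from Carr or from the application: it assumes a direct-mapped cache with four 16-byte lines, two hypothetical 64-byte arrays `a` and `b` whose addresses map to the same cache slots, and a hypothetical helper `count_misses`.

```python
def count_misses(addresses, num_lines=4, line_bytes=16):
    """Count misses in a toy direct-mapped cache (assumed geometry)."""
    slots = [None] * num_lines  # memory line currently held by each slot
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        slot = line % num_lines
        if slots[slot] != line:
            slots[slot] = line  # evict whatever was there
            misses += 1
    return misses


ELEMS = 16        # elements per array
ELEM_BYTES = 4    # bytes per element
BASE_A = 0
BASE_B = 64       # exactly one cache size away, so a[i] and b[i] collide

a = [BASE_A + i * ELEM_BYTES for i in range(ELEMS)]
b = [BASE_B + i * ELEM_BYTES for i in range(ELEMS)]

# One loop touching a[i] and b[i] alternately: each access evicts the other.
interleaved = [addr for pair in zip(a, b) for addr in pair]
# Two separate loops: all of a, then all of b.
separated = a + b

interleaved_misses = count_misses(interleaved)
separated_misses = count_misses(separated)
```

In this model the interleaved order misses on every access, while the separated order misses only once per cache line, which is the kind of behavior the term cache thrashing describes.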



Conclusion 

9. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Andre R. Fowlkes whose telephone number is (703)305- 
8889. The examiner can normally be reached on Monday - Friday, 8:00am-4:30pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Tuan Q. Dam can be reached on (703)305-4552. The fax phone number for 
the organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 